library(readr)
library(tidyverse)
Registered S3 method overwritten by 'dplyr':
  method           from
  print.rowwise_df     
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.0     ✔ purrr   0.3.2
✔ tibble  2.1.3     ✔ dplyr   0.8.2
✔ tidyr   0.8.3     ✔ stringr 1.4.0
✔ ggplot2 3.2.0     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
diamond <- read_csv("diamonds.csv")
Missing column names filled in: 'X1' [1]Parsed with column specification:
cols(
  X1 = col_double(),
  carat = col_double(),
  cut = col_character(),
  color = col_character(),
  clarity = col_character(),
  depth = col_double(),
  table = col_double(),
  price = col_double(),
  x = col_double(),
  y = col_double(),
  z = col_double()
)
diamond_trim <- subset(diamond, select = c("carat", "x", "y", "z"))
summary(diamond_trim)
     carat              x                y                z         
 Min.   :0.2000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.:0.4000   1st Qu.: 4.710   1st Qu.: 4.720   1st Qu.: 2.910  
 Median :0.7000   Median : 5.700   Median : 5.710   Median : 3.530  
 Mean   :0.7979   Mean   : 5.731   Mean   : 5.735   Mean   : 3.539  
 3rd Qu.:1.0400   3rd Qu.: 6.540   3rd Qu.: 6.540   3rd Qu.: 4.040  
 Max.   :5.0100   Max.   :10.740   Max.   :58.900   Max.   :31.800  

Load the diamonds.csv data set and undertake an initial exploration of the data. You will find a description of the meanings of the variables on the relevant Kaggle page

We expect the carat of the diamonds to be strong correlated with the physical dimensions x, y and z. Use ggpairs() to investigate correlations between these four variables.

library(GGally)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2

Attaching package: ‘GGally’

The following object is masked from ‘package:dplyr’:

    nasa
ggpairs(diamond_trim)

So, we do find significant correlations. Let’s drop columns x, y and z from the dataset, in preparation to use only carat going forward.

diamond_carat <- subset(diamond)[, 1:8]

We are interested in developing a regression model for the price of a diamond in terms of the possible predictor variables in the dataset. Use ggpairs() to investigate correlations between price and the predictors (this may take a while to run, don’t worry, make coffee or something).

ggpairs(diamond_carat)

Perform further ggplot visualisations of any significant correlations you find.

diamond_carat %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point()

diamond_carat %>%
  ggplot(aes(x = cut, y = price)) +
  geom_boxplot()

diamond_carat %>%
  ggplot(aes(x = color, y = price)) +
  geom_boxplot()

diamond_carat %>%
  ggplot(aes(x = clarity, y = price)) +
  geom_boxplot()

Shortly we may try a regression fit using one or more of the categorical predictors cut, clarity and color, so let’s investigate these predictors: Investigate the factor levels of these predictors. How many dummy variables do you expect for each of them?

length(unique(diamond_carat$cut))
[1] 5
length(unique(diamond_carat$color))
[1] 7
length(unique(diamond_carat$clarity))
[1] 8

Use the dummy_cols() function in the fastDummies package to generate dummies for these predictors and check the number of dummies in each case.

library(fastDummies)
diamond_carat <- dummy_cols(diamond_carat, 
                      select_columns = c("cut", "color", "clarity"), 
                      remove_first_dummy = TRUE)
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     

Going forward we’ll let R handle dummy variable creation for categorical predictors in regression fitting (remember lm() will generate the correct numbers of dummy levels automatically, absorbing one of the levels into the intercept as a reference level) First, we’ll start with simple linear regression. Regress price on carat and check the regression diagnostics.

price_carat_model <- lm(price ~ carat, 
                        data = diamond_carat)
summary(price_carat_model)

Call:
lm(formula = price ~ carat, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-18585.3   -804.8    -18.9    537.4  12731.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2256.36      13.06  -172.8   <2e-16 ***
carat        7756.43      14.07   551.4   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1549 on 53938 degrees of freedom
Multiple R-squared:  0.8493,    Adjusted R-squared:  0.8493 
F-statistic: 3.041e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
plot(price_carat_model)

Run a regression with one or both of the predictor and response variables log() transformed and recheck the diagnostics. Do you see any improvement?

diamond_carat <- diamond_carat %>%
  mutate(price_log = log(price)) %>%
  mutate(carat_log =log(carat))
log_log_price_carat <- lm(price_log ~ carat_log, 
                          data = diamond_carat)
summary(log_log_price_carat)

Call:
lm(formula = price_log ~ carat_log, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.50833 -0.16951 -0.00591  0.16637  1.33793 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 8.448661   0.001365  6190.9   <2e-16 ***
carat_log   1.675817   0.001934   866.6   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2627 on 53938 degrees of freedom
Multiple R-squared:  0.933, Adjusted R-squared:  0.933 
F-statistic: 7.51e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
plot(log_log_price_carat)

log_price <- lm(price_log ~ carat, 
                data = diamond_carat)
summary(log_price)

Call:
lm(formula = price_log ~ carat, data = diamond_carat)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.2844 -0.2449  0.0335  0.2578  1.5642 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.215021   0.003348    1856   <2e-16 ***
carat       1.969757   0.003608     546   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3972 on 53938 degrees of freedom
Multiple R-squared:  0.8468,    Adjusted R-squared:  0.8468 
F-statistic: 2.981e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
plot(log_price)

log_carat <- lm(price ~ carat_log, 
                data = diamond_carat)
summary(log_carat)

Call:
lm(formula = price ~ carat_log, data = diamond_carat)

Residuals:
    Min      1Q  Median      3Q     Max 
-6137.4 -1495.1  -328.3  1141.9 12075.3 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6237.84      10.73   581.2   <2e-16 ***
carat_log    5836.02      15.21   383.8   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2066 on 53938 degrees of freedom
Multiple R-squared:  0.7319,    Adjusted R-squared:  0.7319 
F-statistic: 1.473e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
plot(log_carat)

Let’s use log() transformations of both predictor and response. Next, experiment with adding a single categorical predictor into the model. Which categorical predictor is best? [Hint - investigate r2 values]

log_log_cut <- lm(price_log ~ carat_log + cut, 
                  data = diamond_carat)
log_log_color <- lm(price_log ~ carat_log + color, 
                  data = diamond_carat)
log_log_clarity <- lm(price_log ~ carat_log + clarity, 
                  data = diamond_carat)
summary(log_log_cut)

Call:
lm(formula = price_log ~ carat_log + cut, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.52247 -0.16484 -0.00587  0.16087  1.38115 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.200125   0.006343 1292.69   <2e-16 ***
carat_log    1.695771   0.001910  887.68   <2e-16 ***
cutGood      0.163245   0.007324   22.29   <2e-16 ***
cutIdeal     0.317212   0.006632   47.83   <2e-16 ***
cutPremium   0.238217   0.006716   35.47   <2e-16 ***
cutVery Good 0.240753   0.006779   35.52   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2545 on 53934 degrees of freedom
Multiple R-squared:  0.9371,    Adjusted R-squared:  0.9371 
F-statistic: 1.607e+05 on 5 and 53934 DF,  p-value: < 2.2e-16
summary(log_log_color)

Call:
lm(formula = price_log ~ carat_log + color, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.49987 -0.14888  0.00257  0.15316  1.27815 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept)  8.572034   0.003051 2809.531  < 2e-16 ***
carat_log    1.728631   0.001814  952.727  < 2e-16 ***
colorE      -0.025460   0.003748   -6.793 1.11e-11 ***
colorF      -0.034455   0.003773   -9.132  < 2e-16 ***
colorG      -0.055399   0.003653  -15.166  < 2e-16 ***
colorH      -0.189859   0.003917  -48.468  < 2e-16 ***
colorI      -0.286928   0.004383  -65.467  < 2e-16 ***
colorJ      -0.425038   0.005417  -78.466  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2372 on 53932 degrees of freedom
Multiple R-squared:  0.9454,    Adjusted R-squared:  0.9454 
F-statistic: 1.333e+05 on 7 and 53932 DF,  p-value: < 2.2e-16
summary(log_log_clarity)

Call:
lm(formula = price_log ~ carat_log + clarity, data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.97521 -0.12085  0.01048  0.12561  1.85854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 7.768115   0.006940 1119.25   <2e-16 ***
carat_log   1.806424   0.001514 1193.23   <2e-16 ***
clarityIF   1.114625   0.008376  133.07   <2e-16 ***
claritySI1  0.624558   0.007163   87.19   <2e-16 ***
claritySI2  0.479658   0.007217   66.46   <2e-16 ***
clarityVS1  0.820461   0.007306  112.30   <2e-16 ***
clarityVS2  0.775248   0.007197  107.72   <2e-16 ***
clarityVVS1 1.028298   0.007745  132.77   <2e-16 ***
clarityVVS2 0.979221   0.007529  130.05   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1888 on 53931 degrees of freedom
Multiple R-squared:  0.9654,    Adjusted R-squared:  0.9654 
F-statistic: 1.879e+05 on 8 and 53931 DF,  p-value: < 2.2e-16

Interpret the fitted coefficients for the levels of your chosen categorical predictor. Which level is the reference level? Which level shows the greatest difference in price from the reference level? [Hints - remember we are regressing the log(price) here, and think about what the presence of the log(carat) predictor implies. We’re not expecting a mathematical explanation]

unique(diamond_carat$clarity)
[1] "SI2"  "SI1"  "VS1"  "VS2"  "VVS2" "VVS1" "I1"   "IF"  

I1 is the reference level log_price of diamind is 1.114625 for clarity SI1 than fro reference clarity I1 for a fixed log(carat)

log of something < 1 is negative -> price is lower than reference log of something > 1 is positive -> increase in price to reference

2 Extension Try adding an interaction between log(carat) and your chosen categorical predictor. Do you think this interaction term is statistically justified?

log_log_caratclarity <- lm(price_log ~ carat_log + clarity + carat_log:clarity, 
                  data = diamond_carat)
summary(log_log_caratclarity)

Call:
lm(formula = price_log ~ carat_log + clarity + carat_log:clarity, 
    data = diamond_carat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.92773 -0.12104  0.01212  0.12465  1.51830 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           7.808102   0.007223 1080.98   <2e-16 ***
carat_log             1.528106   0.014944  102.25   <2e-16 ***
clarityIF             1.122732   0.011381   98.65   <2e-16 ***
claritySI1            0.587556   0.007465   78.71   <2e-16 ***
claritySI2            0.438797   0.007486   58.62   <2e-16 ***
clarityVS1            0.790989   0.007721  102.45   <2e-16 ***
clarityVS2            0.723109   0.007530   96.03   <2e-16 ***
clarityVVS1           1.007591   0.009506  106.00   <2e-16 ***
clarityVVS2           0.968426   0.008359  115.85   <2e-16 ***
carat_log:clarityIF   0.337116   0.017593   19.16   <2e-16 ***
carat_log:claritySI1  0.288214   0.015254   18.89   <2e-16 ***
carat_log:claritySI2  0.258795   0.015437   16.76   <2e-16 ***
carat_log:clarityVS1  0.300307   0.015395   19.51   <2e-16 ***
carat_log:clarityVS2  0.250187   0.015237   16.42   <2e-16 ***
carat_log:clarityVVS1 0.301955   0.016317   18.51   <2e-16 ***
carat_log:clarityVVS2 0.321665   0.015717   20.47   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1877 on 53924 degrees of freedom
Multiple R-squared:  0.9658,    Adjusted R-squared:  0.9658 
F-statistic: 1.014e+05 on 15 and 53924 DF,  p-value: < 2.2e-16

Not justified, R_squared still pretty much the same

Find and plot an appropriate visualisation to show the effect of this interaction

coplot(price_log ~ carat_log | clarity, 
       panel = function(x, y, ...){
         points(x, y)
         abline(lm(y ~ x), col = "blue")
       }, 
       overlap = 0.2,
       data = diamond_carat)

LS0tCnRpdGxlOiAiUiBOb3RlYm9vayIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKYGBge3J9CmxpYnJhcnkocmVhZHIpCmxpYnJhcnkodGlkeXZlcnNlKQpgYGAKCmBgYHtyfQpkaWFtb25kIDwtIHJlYWRfY3N2KCJkaWFtb25kcy5jc3YiKQpkaWFtb25kX3RyaW0gPC0gc3Vic2V0KGRpYW1vbmQsIHNlbGVjdCA9IGMoImNhcmF0IiwgIngiLCAieSIsICJ6IikpCnN1bW1hcnkoZGlhbW9uZF90cmltKQpgYGAKCkxvYWQgdGhlIGRpYW1vbmRzLmNzdiBkYXRhIHNldCBhbmQgdW5kZXJ0YWtlIGFuIGluaXRpYWwgZXhwbG9yYXRpb24gb2YgdGhlIGRhdGEuIFlvdSB3aWxsIGZpbmQgYSBkZXNjcmlwdGlvbiBvZiB0aGUgbWVhbmluZ3Mgb2YgdGhlIHZhcmlhYmxlcyBvbiB0aGUgcmVsZXZhbnQgS2FnZ2xlIHBhZ2UKCldlIGV4cGVjdCB0aGUgY2FyYXQgb2YgdGhlIGRpYW1vbmRzIHRvIGJlIHN0cm9uZyBjb3JyZWxhdGVkIHdpdGggdGhlIHBoeXNpY2FsIGRpbWVuc2lvbnMgeCwgeSBhbmQgei4gVXNlIGdncGFpcnMoKSB0byBpbnZlc3RpZ2F0ZSBjb3JyZWxhdGlvbnMgYmV0d2VlbiB0aGVzZSBmb3VyIHZhcmlhYmxlcy4KYGBge3J9CmxpYnJhcnkoR0dhbGx5KQpnZ3BhaXJzKGRpYW1vbmRfdHJpbSkKYGBgCgpTbywgd2UgZG8gZmluZCBzaWduaWZpY2FudCBjb3JyZWxhdGlvbnMuIExldOKAmXMgZHJvcCBjb2x1bW5zIHgsIHkgYW5kIHogZnJvbSB0aGUgZGF0YXNldCwgaW4gcHJlcGFyYXRpb24gdG8gdXNlIG9ubHkgY2FyYXQgZ29pbmcgZm9yd2FyZC4KYGBge3J9CmRpYW1vbmRfY2FyYXQgPC0gc3Vic2V0KGRpYW1vbmQpWywgMTo4XQpgYGAKCgpXZSBhcmUgaW50ZXJlc3RlZCBpbiBkZXZlbG9waW5nIGEgcmVncmVzc2lvbiBtb2RlbCBmb3IgdGhlIHByaWNlIG9mIGEgZGlhbW9uZCBpbiB0ZXJtcyBvZiB0aGUgcG9zc2libGUgcHJlZGljdG9yIHZhcmlhYmxlcyBpbiB0aGUgZGF0YXNldC4gVXNlIGdncGFpcnMoKSB0byBpbnZlc3RpZ2F0ZSBjb3JyZWxhdGlvbnMgYmV0d2VlbiBwcmljZSBhbmQgdGhlIHByZWRpY3RvcnMgKHRoaXMgbWF5IHRha2UgYSB3aGlsZSB0byBydW4sIGRvbuKAmXQgd29ycnksIG1ha2UgY29mZmVlIG9yIHNvbWV0aGluZykuCmBgYHtyfQpnZ3BhaXJzKGRpYW1vbmRfY2FyYXQpCmBgYAoKClBlcmZvcm0gZnVydGhlciBnZ3Bsb3QgdmlzdWFsaXNhdGlvbnMgb2YgYW55IHNpZ25pZmljYW50IGNvcnJlbGF0aW9ucyB5b3UgZmluZC4KYGBge3J9CmRpYW1vbmRfY2FyYXQgJT4lCiAgZ2dwbG90KGFlcyh4ID0gY2FyYXQsIHkgPSBwcmljZSkpICsKICBnZW9tX3BvaW50KCkKYGBgCgpgYGB7cn0KZGlhbW9uZF9jYXJhdCAlPiUKICBnZ3Bsb3QoYWVzKHggPSBjdXQsIHkgPSBwcmljZSkpICsKICBnZW9tX2JveHBsb3QoKQpgYGAKCmBgYHtyfQpkaWFtb25kX2NhcmF0ICU+JQogIGdncGxvdChhZXMoeCA9IGNvbG9yLCB5ID0gcHJpY2UpKSArCiAgZ2VvbV9ib3hwbG90KCkKYGBgCgpgYGB7cn0KZGlhbW9uZF9jYXJhdCAlPiUKICBnZ3Bsb3QoYWVzKHggPSBjbGFyaXR5LCB5ID0gcHJpY2UpKSArCiAgZ2VvbV9ib3hwbG90KCkKYGBgCgoKU2hvcnRseSB3ZSBtYXkgdHJ5IGEgcmVncmVzc2lvbiBmaXQgdXNpbmcgb25lIG9yIG1vcmUgb2YgdGhlIGNhdGVnb3JpY2FsIHByZWRpY3RvcnMgY3V0LCBjbGFyaXR5IGFuZCBjb2xvciwgc28gbGV04oCZcyBpbnZlc3RpZ2F0ZSB0aGVzZSBwcmVkaWN0b3JzOiBJbnZlc3RpZ2F0ZSB0aGUgZmFjdG9yIGxldmVscyBvZiB0aGVzZSBwcmVkaWN0b3JzLiBIb3cgbWFueSBkdW1teSB2YXJpYWJsZXMgZG8geW91IGV4cGVjdCBmb3IgZWFjaCBvZiB0aGVtPwpgYGB7cn0KbGVuZ3RoKHVuaXF1ZShkaWFtb25kX2NhcmF0JGN1dCkpCmxlbmd0aCh1bmlxdWUoZGlhbW9uZF9jYXJhdCRjb2xvcikpCmxlbmd0aCh1bmlxdWUoZGlhbW9uZF9jYXJhdCRjbGFyaXR5KSkKYGBgCgpVc2UgdGhlIGR1bW15X2NvbHMoKSBmdW5jdGlvbiBpbiB0aGUgZmFzdER1bW1pZXMgcGFja2FnZSB0byBnZW5lcmF0ZSBkdW1taWVzIGZvciB0aGVzZSBwcmVkaWN0b3JzIGFuZCBjaGVjayB0aGUgbnVtYmVyIG9mIGR1bW1pZXMgaW4gZWFjaCBjYXNlLgpgYGB7cn0KbGlicmFyeShmYXN0RHVtbWllcykKZGlhbW9uZF9jYXJhdCA8LSBkdW1teV9jb2xzKGRpYW1vbmRfY2FyYXQsIAogICAgICAgICAgICAgICAgICAgICAgc2VsZWN0X2NvbHVtbnMgPSBjKCJjdXQiLCAiY29sb3IiLCAiY2xhcml0eSIpLCAKICAgICAgICAgICAgICAgICAgICAgIHJlbW92ZV9maXJzdF9kdW1teSA9IFRSVUUpCmBgYAoKR29pbmcgZm9yd2FyZCB3ZeKAmWxsIGxldCBSIGhhbmRsZSBkdW1teSB2YXJpYWJsZSBjcmVhdGlvbiBmb3IgY2F0ZWdvcmljYWwgcHJlZGljdG9ycyBpbiByZWdyZXNzaW9uIGZpdHRpbmcgKHJlbWVtYmVyIGxtKCkgd2lsbCBnZW5lcmF0ZSB0aGUgY29ycmVjdCBudW1iZXJzIG9mIGR1bW15IGxldmVscyBhdXRvbWF0aWNhbGx5LCBhYnNvcmJpbmcgb25lIG9mIHRoZSBsZXZlbHMgaW50byB0aGUgaW50ZXJjZXB0IGFzIGEgcmVmZXJlbmNlIGxldmVsKQpGaXJzdCwgd2XigJlsbCBzdGFydCB3aXRoIHNpbXBsZSBsaW5lYXIgcmVncmVzc2lvbi4gUmVncmVzcyBwcmljZSBvbiBjYXJhdCBhbmQgY2hlY2sgdGhlIHJlZ3Jlc3Npb24gZGlhZ25vc3RpY3MuCmBgYHtyfQpwcmljZV9jYXJhdF9tb2RlbCA8LSBsbShwcmljZSB+IGNhcmF0LCAKICAgICAgICAgICAgICAgICAgICAgICAgZGF0YSA9IGRpYW1vbmRfY2FyYXQpCnN1bW1hcnkocHJpY2VfY2FyYXRfbW9kZWwpCnBsb3QocHJpY2VfY2FyYXRfbW9kZWwpCmBgYAoKUnVuIGEgcmVncmVzc2lvbiB3aXRoIG9uZSBvciBib3RoIG9mIHRoZSBwcmVkaWN0b3IgYW5kIHJlc3BvbnNlIHZhcmlhYmxlcyBsb2coKSB0cmFuc2Zvcm1lZCBhbmQgcmVjaGVjayB0aGUgZGlhZ25vc3RpY3MuIERvIHlvdSBzZWUgYW55IGltcHJvdmVtZW50PwpgYGB7cn0KZGlhbW9uZF9jYXJhdCA8LSBkaWFtb25kX2NhcmF0ICU+JQogIG11dGF0ZShwcmljZV9sb2cgPSBsb2cocHJpY2UpKSAlPiUKICBtdXRhdGUoY2FyYXRfbG9nID1sb2coY2FyYXQpKQpgYGAKCmBgYHtyfQpsb2dfbG9nX3ByaWNlX2NhcmF0IDwtIGxtKHByaWNlX2xvZyB+IGNhcmF0X2xvZywgCiAgICAgICAgICAgICAgICAgICAgICAgICAgZGF0YSA9IGRpYW1vbmRfY2FyYXQpCnN1bW1hcnkobG9nX2xvZ19wcmljZV9jYXJhdCkKcGxvdChsb2dfbG9nX3ByaWNlX2NhcmF0KQpgYGAKCmBgYHtyfQpsb2dfcHJpY2UgPC0gbG0ocHJpY2VfbG9nIH4gY2FyYXQsIAogICAgICAgICAgICAgICAgZGF0YSA9IGRpYW1vbmRfY2FyYXQpCnN1bW1hcnkobG9nX3ByaWNlKQpwbG90KGxvZ19wcmljZSkKYGBgCgpgYGB7cn0KbG9nX2NhcmF0IDwtIGxtKHByaWNlIH4gY2FyYXRfbG9nLCAKICAgICAgICAgICAgICAgIGRhdGEgPSBkaWFtb25kX2NhcmF0KQpzdW1tYXJ5KGxvZ19jYXJhdCkKcGxvdChsb2dfY2FyYXQpCmBgYAoKTGV04oCZcyB1c2UgbG9nKCkgdHJhbnNmb3JtYXRpb25zIG9mIGJvdGggcHJlZGljdG9yIGFuZCByZXNwb25zZS4gTmV4dCwgZXhwZXJpbWVudCB3aXRoIGFkZGluZyBhIHNpbmdsZSBjYXRlZ29yaWNhbCBwcmVkaWN0b3IgaW50byB0aGUgbW9kZWwuIFdoaWNoIGNhdGVnb3JpY2FsIHByZWRpY3RvciBpcyBiZXN0PyBbSGludCAtIGludmVzdGlnYXRlIHIyIHZhbHVlc10KYGBge3J9CmxvZ19sb2dfY3V0IDwtIGxtKHByaWNlX2xvZyB+IGNhcmF0X2xvZyArIGN1dCwgCiAgICAgICAgICAgICAgICAgIGRhdGEgPSBkaWFtb25kX2NhcmF0KQpsb2dfbG9nX2NvbG9yIDwtIGxtKHByaWNlX2xvZyB+IGNhcmF0X2xvZyArIGNvbG9yLCAKICAgICAgICAgICAgICAgICAgZGF0YSA9IGRpYW1vbmRfY2FyYXQpCmxvZ19sb2dfY2xhcml0eSA8LSBsbShwcmljZV9sb2cgfiBjYXJhdF9sb2cgKyBjbGFyaXR5LCAKICAgICAgICAgICAgICAgICAgZGF0YSA9IGRpYW1vbmRfY2FyYXQpCnN1bW1hcnkobG9nX2xvZ19jdXQpCnN1bW1hcnkobG9nX2xvZ19jb2xvcikKc3VtbWFyeShsb2dfbG9nX2NsYXJpdHkpCmBgYAoKSW50ZXJwcmV0IHRoZSBmaXR0ZWQgY29lZmZpY2llbnRzIGZvciB0aGUgbGV2ZWxzIG9mIHlvdXIgY2hvc2VuIGNhdGVnb3JpY2FsIHByZWRpY3Rvci4gV2hpY2ggbGV2ZWwgaXMgdGhlIHJlZmVyZW5jZSBsZXZlbD8gV2hpY2ggbGV2ZWwgc2hvd3MgdGhlIGdyZWF0ZXN0IGRpZmZlcmVuY2UgaW4gcHJpY2UgZnJvbSB0aGUgcmVmZXJlbmNlIGxldmVsPyBbSGludHMgLSByZW1lbWJlciB3ZSBhcmUgcmVncmVzc2luZyB0aGUgbG9nKHByaWNlKSBoZXJlLCBhbmQgdGhpbmsgYWJvdXQgd2hhdCB0aGUgcHJlc2VuY2Ugb2YgdGhlIGxvZyhjYXJhdCkgcHJlZGljdG9yIGltcGxpZXMuIFdl4oCZcmUgbm90IGV4cGVjdGluZyBhIG1hdGhlbWF0aWNhbCBleHBsYW5hdGlvbl0KYGBge3J9CnVuaXF1ZShkaWFtb25kX2NhcmF0JGNsYXJpdHkpCmBgYApJMSBpcyB0aGUgcmVmZXJlbmNlIGxldmVsCmxvZ19wcmljZSBvZiBkaWFtaW5kIGlzIDEuMTE0NjI1IGZvciBjbGFyaXR5IFNJMSB0aGFuIGZybyByZWZlcmVuY2UgY2xhcml0eSBJMQpmb3IgYSBmaXhlZCBsb2coY2FyYXQpCgpsb2cgb2Ygc29tZXRoaW5nIDwgMSBpcyBuZWdhdGl2ZSAtPiBwcmljZSBpcyBsb3dlciB0aGFuIHJlZmVyZW5jZQpsb2cgb2Ygc29tZXRoaW5nID4gMSBpcyBwb3NpdGl2ZSAtPiBpbmNyZWFzZSBpbiBwcmljZSB0byByZWZlcmVuY2UKCgoyIEV4dGVuc2lvbgpUcnkgYWRkaW5nIGFuIGludGVyYWN0aW9uIGJldHdlZW4gbG9nKGNhcmF0KSBhbmQgeW91ciBjaG9zZW4gY2F0ZWdvcmljYWwgcHJlZGljdG9yLiBEbyB5b3UgdGhpbmsgdGhpcyBpbnRlcmFjdGlvbiB0ZXJtIGlzIHN0YXRpc3RpY2FsbHkganVzdGlmaWVkPwpgYGB7cn0KbG9nX2xvZ19jYXJhdGNsYXJpdHkgPC0gbG0ocHJpY2VfbG9nIH4gY2FyYXRfbG9nICsgY2xhcml0eSArIGNhcmF0X2xvZzpjbGFyaXR5LCAKICAgICAgICAgICAgICAgICAgZGF0YSA9IGRpYW1vbmRfY2FyYXQpCnN1bW1hcnkobG9nX2xvZ19jYXJhdGNsYXJpdHkpCmBgYApOb3QganVzdGlmaWVkLCAKUl9zcXVhcmVkIHN0aWxsIHByZXR0eSBtdWNoIHRoZSBzYW1lIAoKRmluZCBhbmQgcGxvdCBhbiBhcHByb3ByaWF0ZSB2aXN1YWxpc2F0aW9uIHRvIHNob3cgdGhlIGVmZmVjdCBvZiB0aGlzIGludGVyYWN0aW9uCgpgYGB7cn0KY29wbG90KHByaWNlX2xvZyB+IGNhcmF0X2xvZyB8IGNsYXJpdHksIAogICAgICAgcGFuZWwgPSBmdW5jdGlvbih4LCB5LCAuLi4pewogICAgICAgICBwb2ludHMoeCwgeSkKICAgICAgICAgYWJsaW5lKGxtKHkgfiB4KSwgY29sID0gImJsdWUiKQogICAgICAgfSwgCiAgICAgICBvdmVybGFwID0gMC4yLAogICAgICAgZGF0YSA9IGRpYW1vbmRfY2FyYXQpCmBgYAoKCg==